Allocating Divisible Resources on Arms with Unknown and Random Rewards
We consider a decision maker allocating one unit of a renewable and divisible
resource in each period across a number of arms. The arms have unknown and
random rewards whose means are proportional to the allocated resource and
whose variances are proportional to an order of the allocated resource. In
particular, if the decision maker allocates resource $x$ to an arm with
unknown mean reward rate $\mu$ in a period, then the realized reward is
$x\mu + x^{b}\xi$, where the noise $\xi$ is independent and sub-Gaussian.
When the order $b$ ranges from 0 to 1, the framework smoothly bridges the
standard stochastic multi-armed bandit and online learning with full
feedback. We design two algorithms that attain the optimal gap-dependent and
gap-independent regret bounds for $b \in [0,1]$, and demonstrate a phase
transition at $b = 1/2$. The theoretical results hinge on a novel
concentration inequality we have developed that bounds a linear combination
of sub-Gaussian random variables whose weights are fractional, adapted to
the filtration, and monotonic.
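As a concrete illustration of the reward model above, the following sketch simulates one period of allocation; the arm means, the allocation, and the noise scale are made-up values for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pull(x, mu, b, sigma=1.0):
    """Reward for allocating resource x to an arm: the mean scales
    linearly in x, while the noise scales as x**b (sub-Gaussian,
    here simply Gaussian for the sketch)."""
    return x * mu + (x ** b) * sigma * rng.standard_normal()

# Allocate one unit of resource across three arms with hypothetical
# unknown means; b = 1/2 is the claimed phase-transition point.
mu = np.array([0.3, 0.5, 0.9])
alloc = np.array([0.2, 0.3, 0.5])  # fractions summing to one
b = 0.5

rewards = [pull(x, m, b) for x, m in zip(alloc, mu)]
```

With `b = 0` the noise is the same regardless of how little resource an arm receives (full-feedback-like), while with `b = 1` the signal-to-noise ratio of an arm no longer improves by concentrating resource on it (bandit-like), which is the bridge the abstract describes.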
Competition among Parallel Contests
We investigate the model of multiple contests held in parallel, where each
contestant selects one contest to join and each contest designer decides the
prize structure to compete for the participation of contestants. We first
analyze the strategic behaviors of contestants and completely characterize the
symmetric Bayesian Nash equilibrium. As for the strategies of contest
designers, when other designers' strategies are known, we show that computing
the best response is NP-hard and propose a fully polynomial time approximation
scheme (FPTAS) to output an $\epsilon$-approximate best response. When other
designers' strategies are unknown, we provide a worst case analysis on one
designer's strategy. We give an upper bound on the utility of any strategy and
propose a method to construct a strategy whose utility can guarantee a constant
ratio of this upper bound in the worst case.
Comment: Accepted by the 18th Conference on Web and Internet Economics
(WINE 2022).
Revenue Maximization and Learning in Products Ranking
We consider the revenue maximization problem for an online retailer who plans
to display a set of products differing in their prices and qualities and rank
them in order. The consumers have random attention spans and view the products
sequentially before purchasing a "satisficing" product or leaving the
platform empty-handed when the attention span gets exhausted. Our framework
extends the cascade model in two directions: the consumers have random
attention spans instead of fixed ones and the firm maximizes revenues instead
of clicking probabilities. We show a nested structure of the optimal product
ranking as a function of the attention span when the attention span is fixed
and design an approximation algorithm accordingly for random attention
spans. When the conditional purchase probabilities are not known and may
depend on consumer and product features, we devise an online learning
algorithm that achieves sublinear regret relative to the approximation
algorithm, despite the censoring of information: the attention span of a
customer who purchases an item is not observable. Numerical experiments
demonstrate the outstanding performance of the approximation and online
learning algorithms.
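The consumer model described above can be made concrete with a small simulation; the prices, purchase probabilities, and attention-span distribution below are illustrative values, not parameters from the paper.

```python
import random

def simulate_consumer(ranking, prices, buy_prob, max_views, rng):
    """Consumer views products in the displayed order and buys the first
    'satisficing' one; once the attention span (number of views) is
    exhausted, the consumer leaves empty-handed. Returns revenue."""
    for position, product in enumerate(ranking):
        if position >= max_views:      # attention span exhausted
            return 0.0
        if rng.random() < buy_prob[product]:
            return prices[product]     # satisficing product purchased
    return 0.0

rng = random.Random(42)
prices = {"A": 10.0, "B": 6.0, "C": 3.0}
buy_prob = {"A": 0.1, "B": 0.3, "C": 0.6}

# Random attention span: each consumer views 1, 2, or 3 products.
n = 10_000
revenue = sum(
    simulate_consumer(["A", "B", "C"], prices, buy_prob,
                      max_views=rng.choice([1, 2, 3]), rng=rng)
    for _ in range(n)
) / n
```

Note the censoring the abstract mentions: when a consumer purchases, the simulation (like the platform) never learns how many more products they would have viewed.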
Algorithmic Decision-Making Safeguarded by Human Knowledge
Commercial AI solutions provide analysts and managers with data-driven
business intelligence for a wide range of decisions, such as demand forecasting
and pricing. However, human analysts may have their own insights and
experience about the decision making that are at odds with the algorithmic
recommendation. In view of such a conflict, we provide a general analytical
framework to study the augmentation of algorithmic decisions with human
knowledge: the analyst uses the knowledge to set a guardrail by which the
algorithmic decision is clipped when the algorithmic output falls out of
bound and seems unreasonable. We study the conditions under which the
augmentation is
beneficial relative to the raw algorithmic decision. We show that when the
algorithmic decision is asymptotically optimal with large data, the
non-data-driven human guardrail usually provides no benefit. However, we point
out three common pitfalls of the algorithmic decision: (1) lack of domain
knowledge, such as the market competition, (2) model misspecification, and (3)
data contamination. In these cases, even with sufficient data, the augmentation
from human knowledge can still improve the performance of the algorithmic
decision
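The guardrail mechanism described above amounts to clipping the algorithmic output to a human-specified interval. A minimal sketch, with hypothetical bounds and forecasts:

```python
def safeguard(algorithmic_decision, lower, upper):
    """Clip the algorithmic output to the analyst's guardrail
    [lower, upper]; values inside the bound pass through unchanged."""
    return min(max(algorithmic_decision, lower), upper)

# E.g. a demand forecast the analyst deems unreasonable above 500 units.
print(safeguard(650.0, lower=100.0, upper=500.0))  # clipped to 500.0
print(safeguard(320.0, lower=100.0, upper=500.0))  # passes through: 320.0
```

Under the three pitfalls the abstract lists (missing domain knowledge, misspecification, contamination), the raw decision can be badly off even with large samples, which is when such a clip helps.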
Equilibrium Analysis of Customer Attraction Games
We introduce a game model called "customer attraction game" to demonstrate
the competition among online content providers. In this model, customers
exhibit interest in various topics. Each content provider selects one topic and
benefits from the attracted customers. We investigate both symmetric and
asymmetric settings involving multiple agents and customers. In the
symmetric setting,
the existence of pure Nash equilibrium (PNE) is guaranteed, but finding a PNE
is PLS-complete. To address this, we propose a fully polynomial time
approximation scheme to identify an approximate PNE. Moreover, the tight Price
of Anarchy (PoA) is established. In the asymmetric setting, we show the
nonexistence of PNE in certain instances and establish that determining its
existence is NP-hard. Nevertheless, we prove the existence of an approximate
PNE. Additionally, when agents select topics sequentially, we demonstrate that
finding a subgame-perfect equilibrium is PSPACE-hard. Furthermore, we present
the sequential PoA for the two-agent setting
Competition among Pairwise Lottery Contests
We investigate a two-stage competitive model involving multiple contests. In
this model, each contest designer chooses two participants from a pool of
candidate contestants and determines the biases. Contestants strategically
distribute their efforts across various contests within their budget. We first
show the existence of a pure strategy Nash equilibrium (PNE) for the
contestants, and propose a polynomial-time algorithm to compute an
$\epsilon$-approximate PNE. In the scenario where designers simultaneously
decide the participants and biases, the subgame perfect equilibrium (SPE) may
not exist. Nonetheless, when designers' decisions are made in two substages,
the existence of SPE is established. In the scenario where designers can hold
multiple contests, we show that the SPE exists under mild conditions and can be
computed efficiently.
Comment: Accepted by the 38th Annual AAAI Conference on Artificial
Intelligence (AAAI 2024).
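A pairwise lottery contest can be sketched with the standard Tullock success function, where the designer's biases weight each contestant's effort; the numbers below are illustrative, and the paper's exact functional form may differ in details.

```python
def win_prob(effort_1, effort_2, bias_1=1.0, bias_2=1.0):
    """Lottery (Tullock) contest success function with designer biases:
    contestant 1 wins with probability proportional to
    bias_1 * effort_1."""
    total = bias_1 * effort_1 + bias_2 * effort_2
    if total == 0.0:
        return 0.5  # tie-breaking convention when neither exerts effort
    return bias_1 * effort_1 / total

# A designer biasing contestant 1 raises their win probability
# for the same pair of efforts.
p_unbiased = win_prob(2.0, 2.0)             # 0.5
p_biased = win_prob(2.0, 2.0, bias_1=3.0)   # 0.75
```

In the two-stage model, contestants take the chosen biases as given and split their budgeted effort across the contests they were selected for.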
On the complexity of computing Markov perfect equilibrium in general-sum stochastic games
Similar to the role of Markov decision processes in reinforcement learning, Markov games (also called stochastic games) lay down the foundation for the study of multi-agent reinforcement learning and sequential agent interactions. We introduce approximate Markov perfect equilibrium as a solution to the computational problem of finite-state stochastic games repeated in the infinite horizon and prove its PPAD-completeness. This solution concept preserves the Markov perfect property and opens up the possibility for the success of multi-agent reinforcement learning algorithms on static two-player games to be extended to multi-agent dynamic games, expanding the reign of the PPAD-complete class
Deep Learning is Provably Robust to Symmetric Label Noise
Deep neural networks (DNNs) are capable of perfectly fitting the training
data, including memorizing noisy data. It is commonly believed that
memorization hurts generalization. Therefore, many recent works propose
mitigation strategies to avoid noisy data or correct memorization. In this
work, we step back and ask the question: Can deep learning be robust against
massive label noise without any mitigation? We provide an affirmative answer
for the case of symmetric label noise: We find that certain DNNs, including
under-parameterized and over-parameterized models, can tolerate massive
symmetric label noise up to the information-theoretic threshold. By appealing
to classical statistical theory and universal consistency of DNNs, we prove
that for multiclass classification with $K$ classes, universally consistent
DNN classifiers trained under symmetric label noise can achieve Bayes
optimality asymptotically if the label noise probability is less than
$(K-1)/K$. Our results show that for symmetric label noise, no mitigation is
necessary for universally consistent estimators. We conjecture that for
general label noise, mitigation strategies that make use of the noisy data
will outperform those that ignore the noisy data.
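To make the symmetric-noise threshold concrete, here is a sketch of injecting symmetric label noise at rate p over K classes; the values are illustrative. Below the threshold (K-1)/K, the clean class remains the most likely observed label, which is what makes asymptotic recovery possible.

```python
import random

def flip_symmetric(label, K, p, rng):
    """With probability p, replace the label by one of the other K-1
    classes chosen uniformly (symmetric label noise)."""
    if rng.random() < p:
        others = [c for c in range(K) if c != label]
        return rng.choice(others)
    return label

K, p = 10, 0.6  # noise rate well below the threshold (K-1)/K = 0.9
rng = random.Random(0)
noisy = [flip_symmetric(3, K, p, rng) for _ in range(100_000)]

# The clean class is still modal: P(keep) = 1 - p = 0.4, while each
# wrong class appears with probability p / (K-1) ~ 0.067.
frac_clean = noisy.count(3) / len(noisy)
```

Once p exceeds (K-1)/K, the clean class is no longer the most frequent observed label and no estimator can recover it, matching the information-theoretic threshold in the abstract.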